Overview

Dataset statistics

Number of variables: 18
Number of observations: 115584
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 15.9 MiB
Average record size in memory: 144.0 B
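
The memory figures above are consistent with simple arithmetic. The sketch below assumes (the report does not state this) that every column stores one 8-byte value per row and that the index adds a fixed 128 bytes; both constants are assumptions chosen for illustration.

```python
# Assumptions (not stated in the report): 8 bytes per cell, 128-byte index.
N_ROWS = 115584
N_COLS = 18
BYTES_PER_CELL = 8
INDEX_BYTES = 128

column_bytes = N_ROWS * BYTES_PER_CELL
total_bytes = N_COLS * column_bytes + INDEX_BYTES

# Per-variable size as listed under "Memory size" (one column plus the index).
column_kib = (column_bytes + INDEX_BYTES) / 1024
total_mib = total_bytes / 1024 ** 2
avg_record_bytes = total_bytes / N_ROWS

print(f"per-variable size: {column_kib:.1f} KiB")
print(f"total size: {total_mib:.1f} MiB")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the three printed values round to the figures reported in this section and in each variable's "Memory size" entry.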

Variable types

Categorical: 9
Numeric: 9

Alerts

accident_year has constant value "2020" Constant
accident_index has a high cardinality: 91199 distinct values High cardinality
accident_reference has a high cardinality: 91199 distinct values High cardinality
casualty_reference is highly correlated with car_passenger High correlation
casualty_class is highly correlated with pedestrian_location and 2 other fields High correlation
age_of_casualty is highly correlated with age_band_of_casualty High correlation
age_band_of_casualty is highly correlated with age_of_casualty High correlation
pedestrian_location is highly correlated with casualty_class and 2 other fields High correlation
pedestrian_movement is highly correlated with casualty_class and 2 other fields High correlation
car_passenger is highly correlated with casualty_reference and 1 other field High correlation
casualty_type is highly correlated with pedestrian_location and 1 other field High correlation
casualty_home_area_type is highly correlated with casualty_imd_decile High correlation
casualty_imd_decile is highly correlated with casualty_home_area_type High correlation
casualty_class is highly correlated with pedestrian_location and 1 other field High correlation
age_of_casualty is highly correlated with age_band_of_casualty High correlation
age_band_of_casualty is highly correlated with age_of_casualty High correlation
pedestrian_location is highly correlated with casualty_class and 1 other field High correlation
pedestrian_movement is highly correlated with casualty_class and 1 other field High correlation
casualty_home_area_type is highly correlated with casualty_imd_decile High correlation
casualty_imd_decile is highly correlated with casualty_home_area_type High correlation
casualty_class is highly correlated with pedestrian_location and 1 other field High correlation
age_of_casualty is highly correlated with age_band_of_casualty High correlation
age_band_of_casualty is highly correlated with age_of_casualty High correlation
pedestrian_location is highly correlated with casualty_class and 2 other fields High correlation
pedestrian_movement is highly correlated with casualty_class and 2 other fields High correlation
casualty_type is highly correlated with pedestrian_location and 1 other field High correlation
casualty_class is highly correlated with car_passenger and 1 other field High correlation
casualty_home_area_type is highly correlated with accident_year High correlation
car_passenger is highly correlated with casualty_class and 1 other field High correlation
sex_of_casualty is highly correlated with accident_year High correlation
accident_year is highly correlated with casualty_class and 5 other fields High correlation
pedestrian_road_maintenance_worker is highly correlated with accident_year High correlation
casualty_severity is highly correlated with accident_year High correlation
casualty_class is highly correlated with pedestrian_location and 2 other fields High correlation
age_of_casualty is highly correlated with age_band_of_casualty High correlation
age_band_of_casualty is highly correlated with age_of_casualty High correlation
pedestrian_location is highly correlated with casualty_class and 1 other field High correlation
pedestrian_movement is highly correlated with casualty_class and 1 other field High correlation
car_passenger is highly correlated with casualty_class High correlation
casualty_home_area_type is highly correlated with casualty_imd_decile High correlation
casualty_imd_decile is highly correlated with casualty_home_area_type High correlation
vehicle_reference is highly skewed (γ1 = 320.7263833) Skewed
casualty_reference is highly skewed (γ1 = 224.095606) Skewed
accident_index is uniformly distributed Uniform
accident_reference is uniformly distributed Uniform
pedestrian_location has 100834 (87.2%) zeros Zeros
pedestrian_movement has 100833 (87.2%) zeros Zeros
bus_or_coach_passenger has 114275 (98.9%) zeros Zeros
casualty_type has 14750 (12.8%) zeros Zeros

Reproduction

Analysis started: 2022-02-02 15:43:24.319545
Analysis finished: 2022-02-02 15:43:52.451845
Duration: 28.13 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

accident_index
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 91199
Distinct (%): 78.9%
Missing: 0
Missing (%): 0.0%
Memory size: 903.1 KiB

Length

Max length: 13
Median length: 13
Mean length: 13
Min length: 13

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 74161
Unique (%): 64.2%

Sample

1st row: 2020010219808
2nd row: 2020010220496
3rd row: 2020010220496
4th row: 2020010228005
5th row: 2020010228006

Common Values

Value                  Count    Frequency (%)
2020440349165          41       < 0.1%
2020990939366          19       < 0.1%
2020140924772          17       < 0.1%
2020460977371          13       < 0.1%
2020470916576          12       < 0.1%
2020010237692          11       < 0.1%
202006F172817          11       < 0.1%
2020350984206          11       < 0.1%
2020170H10270          11       < 0.1%
2020160984754          10       < 0.1%
Other values (91189)   115428   99.9%
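
The gap between "Distinct" (91199) and "Unique" (74161) for accident_index comes from how the two are counted: distinct is the number of different values present, unique is the number of values that occur exactly once. A minimal sketch on the five sample rows of accident_index (the variable names are illustrative):

```python
from collections import Counter

# The five sample rows listed for accident_index in this report.
sample = [
    "2020010219808",
    "2020010220496",
    "2020010220496",
    "2020010228005",
    "2020010228006",
]

counts = Counter(sample)
distinct = len(counts)                               # different values present
unique = sum(1 for c in counts.values() if c == 1)   # values seen exactly once

print(distinct, unique)
```

Here one accident index appears twice (two casualties in the same accident), so it counts as distinct but not as unique.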

accident_year
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 903.1 KiB

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2020
2nd row: 2020
3rd row: 2020
4th row: 2020
5th row: 2020

Common Values

Value   Count    Frequency (%)
2020    115584   100.0%

accident_reference
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 91199
Distinct (%): 78.9%
Missing: 0
Missing (%): 0.0%
Memory size: 903.1 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 74161
Unique (%): 64.2%

Sample

1st row: 010219808
2nd row: 010220496
3rd row: 010220496
4th row: 010228005
5th row: 010228006

Common Values

Value                  Count    Frequency (%)
440349165              41       < 0.1%
990939366              19       < 0.1%
140924772              17       < 0.1%
460977371              13       < 0.1%
470916576              12       < 0.1%
06F172817              11       < 0.1%
010237692              11       < 0.1%
350984206              11       < 0.1%
170H10270              11       < 0.1%
100966861              10       < 0.1%
Other values (91189)   115428   99.9%

vehicle_reference
Real number (ℝ≥0)

SKEWED

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.460556824
Minimum: 1
Maximum: 999
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 903.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 1
Q3: 2
95-th percentile: 2
Maximum: 999
Range: 998
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 2.991765428
Coefficient of variation (CV): 2.048373181
Kurtosis: 106936.656
Mean: 1.460556824
Median Absolute Deviation (MAD): 0
Skewness: 320.7263833
Sum: 168817
Variance: 8.950660379
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=12)

Common values

Value              Count   Frequency (%)
1                  67571   58.5%
2                  44589   38.6%
3                  2846    2.5%
4                  436     0.4%
5                  99      0.1%
6                  25      < 0.1%
7                  6       < 0.1%
8                  5       < 0.1%
9                  2       < 0.1%
10                 2       < 0.1%
Other values (2)   3       < 0.1%

Extreme values (smallest first)

Value   Count   Frequency (%)
1       67571   58.5%
2       44589   38.6%
3       2846    2.5%
4       436     0.4%
5       99      0.1%
6       25      < 0.1%
7       6       < 0.1%
8       5       < 0.1%
9       2       < 0.1%
10      2       < 0.1%

Extreme values (largest first)

Value   Count   Frequency (%)
999     1       < 0.1%
11      2       < 0.1%
10      2       < 0.1%
9       2       < 0.1%
8       5       < 0.1%
7       6       < 0.1%
6       25      < 0.1%
5       99      0.1%
4       436     0.4%
3       2846    2.5%
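
The very large skewness of vehicle_reference appears to be driven by the single record with value 999. The sketch below recomputes the skewness from the value counts listed in this section, assuming the report uses the adjusted Fisher-Pearson estimator (as `pandas.Series.skew` does); the function name is illustrative, and whether 999 is a placeholder code is not stated in the report.

```python
import math

# Value -> count for vehicle_reference, copied from the tables in this section.
freq = {1: 67571, 2: 44589, 3: 2846, 4: 436, 5: 99, 6: 25,
        7: 6, 8: 5, 9: 2, 10: 2, 11: 2, 999: 1}


def weighted_skewness(freq):
    """Adjusted Fisher-Pearson skewness (G1) from a value -> count table."""
    n = sum(freq.values())
    mean = sum(v * c for v, c in freq.items()) / n
    m2 = sum(c * (v - mean) ** 2 for v, c in freq.items()) / n  # 2nd central moment
    m3 = sum(c * (v - mean) ** 3 for v, c in freq.items()) / n  # 3rd central moment
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


skew_all = weighted_skewness(freq)
skew_without_999 = weighted_skewness({v: c for v, c in freq.items() if v != 999})
print(skew_all, skew_without_999)
```

Comparing the two printed values shows how much of the "Skewed" alert depends on that one record.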

casualty_reference
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 43
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.347790352
Minimum: 1
Maximum: 992
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 903.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 1
Q3: 1
95-th percentile: 3
Maximum: 992
Range: 991
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 4.036720714
Coefficient of variation (CV): 2.995065745
Kurtosis: 52821.99629
Mean: 1.347790352
Median Absolute Deviation (MAD): 0
Skewness: 224.095606
Sum: 155783
Variance: 16.29511412
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=43)

Common values

Value               Count   Frequency (%)
1                   90228   78.1%
2                   17657   15.3%
3                   5020    4.3%
4                   1689    1.5%
5                   571     0.5%
6                   203     0.2%
7                   83      0.1%
8                   35      < 0.1%
9                   19      < 0.1%
10                  13      < 0.1%
Other values (33)   66      0.1%

Extreme values (smallest first)

Value   Count   Frequency (%)
1       90228   78.1%
2       17657   15.3%
3       5020    4.3%
4       1689    1.5%
5       571     0.5%
6       203     0.2%
7       83      0.1%
8       35      < 0.1%
9       19      < 0.1%
10      13      < 0.1%

Extreme values (largest first)

Value   Count   Frequency (%)
992     1       < 0.1%
902     1       < 0.1%
41      1       < 0.1%
40      2       < 0.1%
39      1       < 0.1%
38      1       < 0.1%
37      1       < 0.1%
36      1       < 0.1%
35      1       < 0.1%
34      1       < 0.1%

casualty_class
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 903.1 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 3
3rd row: 3
4th row: 3
5th row: 3

Common Values

Value   Count   Frequency (%)
1       79330   68.6%
2       21504   18.6%
3       14750   12.8%

sex_of_casualty
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 903.1 KiB

Length

Max length: 2
Median length: 1
Mean length: 1.006540698
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 2
3rd row: 2
4th row: 1
5th row: 1

Common Values

Value   Count   Frequency (%)
1       72335   62.6%
2       42488   36.8%
-1      756     0.7%
9       5       < 0.1%

age_of_casualty
Real number (ℝ)

HIGH CORRELATION

Distinct: 101
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 36.48974772
Minimum: -1
Maximum: 99
Zeros: 130
Zeros (%): 0.1%
Negative: 2481
Negative (%): 2.1%
Memory size: 903.1 KiB

Quantile statistics

Minimum: -1
5-th percentile: 9
Q1: 23
median: 33
Q3: 50
95-th percentile: 72
Maximum: 99
Range: 100
Interquartile range (IQR): 27

Descriptive statistics

Standard deviation: 18.98502214
Coefficient of variation (CV): 0.5202837324
Kurtosis: -0.2047937287
Mean: 36.48974772
Median Absolute Deviation (MAD): 13
Skewness: 0.4471627493
Sum: 4217631
Variance: 360.4310656
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value               Count   Frequency (%)
30                  3135    2.7%
19                  2881    2.5%
20                  2796    2.4%
25                  2749    2.4%
26                  2739    2.4%
28                  2738    2.4%
22                  2718    2.4%
23                  2718    2.4%
24                  2699    2.3%
18                  2686    2.3%
Other values (91)   87725   75.9%

Extreme values (smallest first)

Value   Count   Frequency (%)
-1      2481    2.1%
0       130     0.1%
1       186     0.2%
2       295     0.3%
3       349     0.3%
4       436     0.4%
5       436     0.4%
6       431     0.4%
7       529     0.5%
8       491     0.4%

Extreme values (largest first)

Value   Count   Frequency (%)
99      2       < 0.1%
98      7       < 0.1%
97      2       < 0.1%
96      13      < 0.1%
95      21      < 0.1%
94      17      < 0.1%
93      38      < 0.1%
92      49      < 0.1%
91      66      0.1%
90      93      0.1%
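
The report counts no missing cells, yet 2481 rows of age_of_casualty hold the value -1. If -1 is a placeholder for an unknown age (an assumption; the report does not say so), it pulls the mean down. A worked sketch using only the figures listed in this section:

```python
# Figures copied from the age_of_casualty statistics in this section.
n_rows = 115584
total_sum = 4217631
n_minus_one = 2481      # rows where age_of_casualty == -1

# Assumption: -1 marks an unknown age rather than a real value.
valid_rows = n_rows - n_minus_one
valid_sum = total_sum - (-1) * n_minus_one   # remove the -1 contributions

mean_all = total_sum / n_rows
mean_valid = valid_sum / valid_rows
print(f"mean with -1 rows: {mean_all:.2f}")
print(f"mean without -1 rows: {mean_valid:.2f}")
```

The same adjustment would apply to any other statistic computed on this column, so it may be worth recoding -1 as missing before profiling.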

age_band_of_casualty
Real number (ℝ)

HIGH CORRELATION

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.292609704
Minimum: -1
Maximum: 11
Zeros: 0
Zeros (%): 0.0%
Negative: 2481
Negative (%): 2.1%
Memory size: 903.1 KiB

Quantile statistics

Minimum: -1
5-th percentile: 2
Q1: 5
median: 6
Q3: 8
95-th percentile: 10
Maximum: 11
Range: 12
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.392856268
Coefficient of variation (CV): 0.3802645294
Kurtosis: 0.6772990663
Mean: 6.292609704
Median Absolute Deviation (MAD): 2
Skewness: -0.5028817423
Sum: 727325
Variance: 5.725761117
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=12)

Common values

Value              Count   Frequency (%)
6                  25511   22.1%
7                  17805   15.4%
8                  15669   13.6%
5                  13568   11.7%
4                  11627   10.1%
9                  10390   9.0%
10                 5337    4.6%
3                  4740    4.1%
11                 4025    3.5%
2                  2599    2.2%
Other values (2)   4313    3.7%

Extreme values (smallest first)

Value   Count   Frequency (%)
-1      2481    2.1%
1       1832    1.6%
2       2599    2.2%
3       4740    4.1%
4       11627   10.1%
5       13568   11.7%
6       25511   22.1%
7       17805   15.4%
8       15669   13.6%
9       10390   9.0%

Extreme values (largest first)

Value   Count   Frequency (%)
11      4025    3.5%
10      5337    4.6%
9       10390   9.0%
8       15669   13.6%
7       17805   15.4%
6       25511   22.1%
5       13568   11.7%
4       11627   10.1%
3       4740    4.1%
2       2599    2.2%

casualty_severity
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 903.1 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 3
3rd row: 3
4th row: 3
5th row: 2

Common Values

Value   Count   Frequency (%)
3       94022   81.3%
2       20102   17.4%
1       1460    1.3%

pedestrian_location
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.6968611573
Minimum: -1
Maximum: 10
Zeros: 100834
Zeros (%): 87.2%
Negative: 2
Negative (%): < 0.1%
Memory size: 903.1 KiB

Quantile statistics

Minimum: -1
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 5
Maximum: 10
Range: 11
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.059929866
Coefficient of variation (CV): 2.956011889
Kurtosis: 8.474022246
Mean: 0.6968611573
Median Absolute Deviation (MAD): 0
Skewness: 3.047900729
Sum: 80546
Variance: 4.243311051
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=12)

Common values

Value              Count    Frequency (%)
0                  100834   87.2%
5                  5828     5.0%
1                  2419     2.1%
6                  1709     1.5%
9                  1614     1.4%
10                 1393     1.2%
4                  845      0.7%
8                  757      0.7%
7                  91       0.1%
2                  70       0.1%
Other values (2)   24       < 0.1%

Extreme values (smallest first)

Value   Count    Frequency (%)
-1      2        < 0.1%
0       100834   87.2%
1       2419     2.1%
2       70       0.1%
3       22       < 0.1%
4       845      0.7%
5       5828     5.0%
6       1709     1.5%
7       91       0.1%
8       757      0.7%

Extreme values (largest first)

Value   Count   Frequency (%)
10      1393    1.2%
9       1614    1.4%
8       757     0.7%
7       91      0.1%
6       1709    1.5%
5       5828    5.0%
4       845     0.7%
3       22      < 0.1%
2       70      0.1%
1       2419    2.1%

pedestrian_movement
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5615915698
Minimum: -1
Maximum: 9
Zeros: 100833
Zeros (%): 87.2%
Negative: 2
Negative (%): < 0.1%
Memory size: 903.1 KiB

Quantile statistics

Minimum: -1
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 4
Maximum: 9
Range: 10
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.879680074
Coefficient of variation (CV): 3.347058921
Kurtosis: 13.20460614
Mean: 0.5615915698
Median Absolute Deviation (MAD): 0
Skewness: 3.749485504
Sum: 64911
Variance: 3.53319718
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=11)

Common values

Value   Count    Frequency (%)
0       100833   87.2%
1       4593     4.0%
9       4132     3.6%
3       3093     2.7%
5       776      0.7%
2       746      0.6%
4       569      0.5%
8       417      0.4%
7       331      0.3%
6       92       0.1%

Extreme values (smallest first)

Value   Count    Frequency (%)
-1      2        < 0.1%
0       100833   87.2%
1       4593     4.0%
2       746      0.6%
3       3093     2.7%
4       569      0.5%
5       776      0.7%
6       92       0.1%
7       331      0.3%
8       417      0.4%

Extreme values (largest first)

Value   Count    Frequency (%)
9       4132     3.6%
8       417      0.4%
7       331      0.3%
6       92       0.1%
5       776      0.7%
4       569      0.5%
3       3093     2.7%
2       746      0.6%
1       4593     4.0%
0       100833   87.2%

car_passenger
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 903.1 KiB

Length

Max length: 2
Median length: 1
Mean length: 1.002690684
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value   Count   Frequency (%)
0       96655   83.6%
1       11958   10.3%
2       6543    5.7%
-1      311     0.3%
9       117     0.1%

bus_or_coach_passenger
Real number (ℝ)

ZEROS

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.03895002769
Minimum: -1
Maximum: 9
Zeros: 114275
Zeros (%): 98.9%
Negative: 22
Negative (%): < 0.1%
Memory size: 903.1 KiB

Quantile statistics

Minimum: -1
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 9
Range: 10
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.3815276336
Coefficient of variation (CV): 9.795310974
Kurtosis: 112.1954802
Mean: 0.03895002769
Median Absolute Deviation (MAD): 0
Skewness: 10.2245913
Sum: 4502
Variance: 0.1455633352
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Common values

Value   Count    Frequency (%)
0       114275   98.9%
4       796      0.7%
3       350      0.3%
2       77       0.1%
1       55       < 0.1%
-1      22       < 0.1%
9       9        < 0.1%

Extreme values (smallest first)

Value   Count    Frequency (%)
-1      22       < 0.1%
0       114275   98.9%
1       55       < 0.1%
2       77       0.1%
3       350      0.3%
4       796      0.7%
9       9        < 0.1%

Extreme values (largest first)

Value   Count    Frequency (%)
9       9        < 0.1%
4       796      0.7%
3       350      0.3%
2       77       0.1%
1       55       < 0.1%
0       114275   98.9%
-1      22       < 0.1%

pedestrian_road_maintenance_worker
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 903.1 KiB

Length

Max length: 2
Median length: 1
Mean length: 1.000813261
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value   Count    Frequency (%)
0       114672   99.2%
2       745      0.6%
-1      94       0.1%
1       73       0.1%

casualty_type
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 21
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.388366902
Minimum: 0
Maximum: 98
Zeros: 14750
Zeros (%): 12.8%
Negative: 0
Negative (%): 0.0%
Memory size: 903.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 9
Q3: 9
95-th percentile: 11
Maximum: 98
Range: 98
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 9.914713615
Coefficient of variation (CV): 1.341935741
Kurtosis: 56.13020795
Mean: 7.388366902
Median Absolute Deviation (MAD): 0
Skewness: 6.75436421
Sum: 853977
Variance: 98.30154606
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=21)

Common values

Value               Count   Frequency (%)
9                   62698   54.2%
1                   16294   14.1%
0                   14750   12.8%
3                   6993    6.1%
5                   3677    3.2%
19                  3235    2.8%
4                   1546    1.3%
11                  1506    1.3%
8                   1419    1.2%
2                   1001    0.9%
Other values (11)   2465    2.1%

Extreme values (smallest first)

Value   Count   Frequency (%)
0       14750   12.8%
1       16294   14.1%
2       1001    0.9%
3       6993    6.1%
4       1546    1.3%
5       3677    3.2%
8       1419    1.2%
9       62698   54.2%
10      138     0.1%
11      1506    1.3%

Extreme values (largest first)

Value   Count   Frequency (%)
98      209     0.2%
97      301     0.3%
90      695     0.6%
23      86      0.1%
22      155     0.1%
21      447     0.4%
20      263     0.2%
19      3235    2.8%
18      3       < 0.1%
17      82      0.1%

casualty_home_area_type
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 903.1 KiB

Length

Max length: 2
Median length: 1
Mean length: 1.093109773
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value   Count   Frequency (%)
1       85122   73.6%
3       10860   9.4%
-1      10762   9.3%
2       8840    7.6%

casualty_imd_decile
Real number (ℝ)

HIGH CORRELATION

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.36113995
Minimum: -1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 10910
Negative (%): 9.4%
Memory size: 903.1 KiB

Quantile statistics

Minimum: -1
5-th percentile: -1
Q1: 2
median: 4
Q3: 7
95-th percentile: 10
Maximum: 10
Range: 11
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.171409748
Coefficient of variation (CV): 0.7271974263
Kurtosis: -0.9571051199
Mean: 4.36113995
Median Absolute Deviation (MAD): 2
Skewness: 0.08002709433
Sum: 504078
Variance: 10.05783979
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=11)

Common values

Value   Count   Frequency (%)
2       13604   11.8%
1       12848   11.1%
3       12826   11.1%
4       11597   10.0%
-1      10910   9.4%
5       10863   9.4%
6       10225   8.8%
7       9255    8.0%
8       8438    7.3%
9       8068    7.0%

Extreme values (smallest first)

Value   Count   Frequency (%)
-1      10910   9.4%
1       12848   11.1%
2       13604   11.8%
3       12826   11.1%
4       11597   10.0%
5       10863   9.4%
6       10225   8.8%
7       9255    8.0%
8       8438    7.3%
9       8068    7.0%

Extreme values (largest first)

Value   Count   Frequency (%)
10      6950    6.0%
9       8068    7.0%
8       8438    7.3%
7       9255    8.0%
6       10225   8.8%
5       10863   9.4%
4       11597   10.0%
3       12826   11.1%
2       13604   11.8%
1       12848   11.1%

Interactions

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
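
The calculation described above can be sketched in plain Python: rank both variables (ties receive the average of their positions), then apply the covariance-over-standard-deviations formula to the ranks. The function names and toy data are illustrative assumptions, not the report's own implementation.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of standard deviations (1/n cancels)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]   # y = x**3: monotonic but not linear
print(spearman_rho(x, y), pearson_r(x, y))
```

On this toy data ρ reaches its maximum because the relation is perfectly monotonic, while r stays below 1 because the relation is not linear.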

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
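
A minimal sketch of that calculation, which also illustrates the invariance under changes of location and scale. The function name and toy data are illustrative assumptions.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The 1/n factors in the covariance and in both variances cancel,
    so plain sums of deviations are used.
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]                       # y = 2x + 1, an exact linear relation
shifted_scaled_x = [10 * a + 7 for a in x]  # change of location and scale

print(pearson_r(x, y))
print(pearson_r(shifted_scaled_x, y))
```

Both calls give the same coefficient, since shifting and rescaling x does not change how tightly the points follow a line.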

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
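
The pair-counting definition above can be sketched directly. This is the simple variant without a correction for ties (often called τ-a); statistical libraries frequently use a tie-corrected variant, so values may differ when the data contain many ties. The function name and toy data are illustrative assumptions.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Same sign of the two differences means the pair is ordered the same way.
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]   # one swapped pair out of six
print(kendall_tau_a(x, y))
```

With four observations there are six pairs; five are concordant and one is discordant, giving τ = (5 - 1) / 6.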

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
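
A minimal sketch of a bias-corrected Cramér's V, written from the correction as it is commonly stated: φ² is reduced by (k-1)(r-1)/(n-1) and the table dimensions r and k are shrunk accordingly. The function name and toy data are illustrative assumptions, and this is not taken from the report's own implementation.

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(x)
    row_totals = Counter(x)
    col_totals = Counter(y)
    observed = Counter(zip(x, y))      # missing cells read as 0
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


labels = ["a", "b"] * 50
v_perfect = cramers_v_corrected(labels, labels)           # identical variables
v_independent = cramers_v_corrected(["a", "a", "b", "b"] * 25,
                                    ["u", "v", "u", "v"] * 25)
print(v_perfect, v_independent)
```

The two toy cases cover the ends of the range: identical variables and a table in which every cell matches its expected count.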

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.

Sample

First rows

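row | accident_index | accident_year | accident_reference | vehicle_reference | casualty_reference | casualty_class | sex_of_casualty | age_of_casualty | age_band_of_casualty | casualty_severity | pedestrian_location | pedestrian_movement | car_passenger | bus_or_coach_passenger | pedestrian_road_maintenance_worker | casualty_type | casualty_home_area_type | casualty_imd_decile
0 | 2020010219808 | 2020 | 010219808 | 1 | 1 | 3 | 1 | 31 | 6 | 3 | 9 | 5 | 0 | 0 | 0 | 0 | 1 | 4
1 | 2020010220496 | 2020 | 010220496 | 1 | 1 | 3 | 2 | 2 | 1 | 3 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 2
2 | 2020010220496 | 2020 | 010220496 | 1 | 2 | 3 | 2 | 4 | 1 | 3 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 2
3 | 2020010228005 | 2020 | 010228005 | 1 | 1 | 3 | 1 | 23 | 5 | 3 | 5 | 9 | 0 | 0 | 0 | 0 | 1 | 3
4 | 2020010228006 | 2020 | 010228006 | 1 | 1 | 3 | 1 | 47 | 8 | 2 | 4 | 1 | 0 | 0 | 0 | 0 | 1 | 3
5 | 2020010228011 | 2020 | 010228011 | 1 | 1 | 3 | 2 | 32 | 6 | 3 | 6 | 9 | 0 | 0 | 0 | 0 | 1 | 8
6 | 2020010228011 | 2020 | 010228011 | 1 | 2 | 3 | 2 | 33 | 6 | 3 | 6 | 9 | 0 | 0 | 0 | 0 | -1 | -1
7 | 2020010228012 | 2020 | 010228012 | 1 | 1 | 1 | 1 | 25 | 5 | 3 | 0 | 0 | 0 | 0 | 0 | 9 | 1 | 4
8 | 2020010228014 | 2020 | 010228014 | 1 | 1 | 1 | 1 | 41 | 7 | 3 | 0 | 0 | 0 | 0 | 0 | 9 | 1 | 3
9 | 2020010228017 | 2020 | 010228017 | 1 | 1 | 3 | 1 | 50 | 8 | 2 | 9 | 9 | 0 | 0 | 0 | 0 | 1 | 3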

Last rows

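row | accident_index | accident_year | accident_reference | vehicle_reference | casualty_reference | casualty_class | sex_of_casualty | age_of_casualty | age_band_of_casualty | casualty_severity | pedestrian_location | pedestrian_movement | car_passenger | bus_or_coach_passenger | pedestrian_road_maintenance_worker | casualty_type | casualty_home_area_type | casualty_imd_decile
115574 | 2020991023880 | 2020 | 991023880 | 1 | 1 | 3 | 2 | 58 | 9 | 3 | 5 | 1 | 0 | 0 | 0 | 0 | 1 | 4
115575 | 2020991024039 | 2020 | 991024039 | 2 | 1 | 1 | 1 | 52 | 8 | 3 | 0 | 0 | 0 | 0 | 0 | 9 | 3 | 4
115576 | 2020991024209 | 2020 | 991024209 | 1 | 1 | 1 | 2 | 33 | 6 | 3 | 0 | 0 | 0 | 0 | 0 | 9 | 1 | 10
115577 | 2020991024209 | 2020 | 991024209 | 1 | 2 | 2 | 2 | 13 | 3 | 3 | 0 | 0 | 2 | 0 | 0 | 9 | 1 | 8
115578 | 2020991024526 | 2020 | 991024526 | 1 | 1 | 3 | 1 | 69 | 10 | 3 | 6 | 9 | 0 | 0 | 0 | 0 | 3 | 7
115579 | 2020991027064 | 2020 | 991027064 | 2 | 1 | 1 | 1 | 11 | 3 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 2
115580 | 2020991029573 | 2020 | 991029573 | 1 | 1 | 3 | 2 | 63 | 9 | 3 | 10 | 1 | 0 | 0 | 0 | 0 | 1 | 10
115581 | 2020991030297 | 2020 | 991030297 | 2 | 1 | 1 | 1 | 38 | 7 | 2 | 0 | 0 | 0 | 0 | 0 | 5 | 2 | 9
115582 | 2020991030900 | 2020 | 991030900 | 2 | 1 | 1 | 1 | 76 | 11 | 3 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 9
115583 | 2020991032575 | 2020 | 991032575 | 1 | 1 | 3 | 1 | 48 | 8 | 3 | 9 | 9 | 0 | 0 | 0 | 0 | 1 | 1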